40 research outputs found

    Extending the EmotiNet Knowledge Base to Improve the Automatic Detection of Implicitly Expressed Emotions from Text

    Sentiment analysis is one of the recent, highly dynamic fields in Natural Language Processing. Most existing approaches are based on word-level analysis of texts and are mostly able to detect only explicit expressions of sentiment. However, in many cases, emotions are not expressed by using words with an affective meaning (e.g. happy), but by describing real-life situations which readers (based on their commonsense knowledge) recognise as being related to a specific emotion. Given the challenges of detecting emotions from contexts in which no lexical clue is present, in this article we present a comparative analysis between the performance of well-established methods for emotion detection (supervised and lexical knowledge-based) and a method we propose and extend, which is based on commonsense knowledge stored in the EmotiNet knowledge base. Our extensive evaluations show that, in the context of this task, the approach based on EmotiNet is the most appropriate. JRC.G.2 - Global security and crisis management
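
    The idea can be illustrated with a toy example: a sentence such as "I failed the exam" contains no affect word, yet a commonsense knowledge base linking situations to emotions allows the emotion to be inferred. The sketch below assumes a deliberately simplified key-value structure with invented tuples; it is not the actual EmotiNet schema or API.

```python
# Toy illustration (assumed, highly simplified; not the real EmotiNet
# schema or API): map an action tuple extracted from text to an emotion
# via a small commonsense knowledge base of situation-emotion pairs.
TOY_KB = {
    ("fail", "exam"): "sadness",
    ("win", "lottery"): "joy",
    ("lose", "wallet"): "anger",
}

def detect_implicit_emotion(action, obj, kb=TOY_KB):
    # "I failed the exam" carries no affect word, but the (action, object)
    # pair maps to an emotion through commonsense knowledge
    return kb.get((action, obj), "unknown")

print(detect_implicit_emotion("fail", "exam"))  # -> sadness
```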

    Detecting Event-Related Links and Sentiments from Social Media Texts

    Nowadays, the importance of Social Media is constantly growing, as people often use such platforms to share mainstream media news and comment on the events they relate to. As such, people no longer remain mere spectators of the events that happen in the world, but become part of them, commenting on their developments and the entities involved, sharing their opinions and distributing related content. This paper describes a system that links the main events detected from clusters of newspaper articles to the tweets related to them, detects complementary information sources from the links the tweets contain, and subsequently applies sentiment analysis to classify them as positive, negative or neutral. In this manner, readers can follow the main events happening in the world from the perspective of both mainstream and social media, together with the public's perception of them. This system is part of a live media monitoring framework and will be demonstrated using Google Earth. JRC.G.2 - Global security and crisis management
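
    A rough impression of the pipeline can be given in a few lines, shown below. This is a minimal sketch under assumptions (keyword-overlap linking and a tiny hand-made sentiment lexicon), not the live media monitoring system described above.

```python
# Minimal sketch (illustrative assumptions, not the described live system):
# link a tweet to a news event via keyword overlap, collect the URLs it
# shares as complementary sources, and assign a coarse lexicon-based
# sentiment label (positive / negative / neutral).
import re

POSITIVE = {"good", "great", "support", "hope"}
NEGATIVE = {"bad", "terrible", "against", "fear"}

def link_and_classify(tweet, event_keywords):
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    if not words & set(event_keywords):
        return None                               # tweet unrelated to the event
    urls = re.findall(r"https?://\S+", tweet)     # complementary information sources
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"urls": urls, "sentiment": sentiment}

print(link_and_classify("Great rescue effort after the floods http://ex.org/a",
                        ["floods", "rescue"]))
```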

    Definition of culture-bound emotion triggers and their application to the classification of valence and emotion in texts

    This paper presents a method to automatically spot and classify the valence and emotions present in written text, based on a concept we introduce: emotion triggers. The first step consists of incrementally building a culture-dependent lexical database of emotion triggers, drawing on relevance theory from pragmatics, Maslow's theory of human needs from psychology, and Neef's theory of human needs from economics. We start from a core set of terms and expand it using lexical resources such as WordNet, complemented by NomLex, with word senses disambiguated using the Relevant Domains concept. The mapping across languages is accomplished using EuroWordNet, and the completion and projection to different cultures is done through language-specific commonsense knowledge bases. Subsequently, we show how the constructed database can be used to mine texts for valence (polarity) and affective meaning. An evaluation is performed on the SemEval Task No. 14 (Affective Text) test data and its Spanish translation. The results and improvements are presented together with a discussion of the strong and weak points of the method and directions for future work.
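
    The expansion step can be sketched with NLTK's WordNet interface. The seed terms and the one-level hyponym expansion below are illustrative assumptions; the full method additionally uses NomLex, Relevant Domains, EuroWordNet and culture-specific commonsense knowledge bases.

```python
# Minimal sketch (not the authors' code): expand a seed list of
# emotion-trigger terms with WordNet synonyms and direct hyponyms.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def expand_triggers(seed_terms):
    expanded = set(seed_terms)
    for term in seed_terms:
        for synset in wn.synsets(term):
            # add synonyms (lemma names) for every sense of the term
            expanded.update(l.replace("_", " ") for l in synset.lemma_names())
            # add direct hyponyms as more specific trigger candidates
            for hypo in synset.hyponyms():
                expanded.update(l.replace("_", " ") for l in hypo.lemma_names())
    return sorted(expanded)

if __name__ == "__main__":
    # hypothetical seeds inspired by Maslow's basic needs
    print(expand_triggers(["food", "safety", "friendship"])[:20])
```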

    Going beyond traditional QA systems: challenges and keys in opinion question answering

    The treatment of factual data has been widely studied in different areas of Natural Language Processing (NLP). However, processing subjective information still poses important challenges. This paper presents research aimed at assessing techniques that have been suggested as appropriate in the context of subjective information processing, namely Opinion Question Answering (OQA). We evaluate the performance of an OQA system extended with these new components and propose methods to optimally tackle the issues encountered. We assess the impact of including additional resources and processes with the purpose of improving system performance on two distinct blog datasets. The improvements obtained for the different combinations of tools are statistically significant. We thus conclude that the proposed approach is adequate for the OQA task, offering a good strategy to deal with opinionated questions. This paper has been partially supported by Ministerio de Ciencia e Innovación - Spanish Government (grant no. TIN2009-13391-C04-01) and Conselleria d'Educació - Generalitat Valenciana (grants no. PROMETEO/2009/119 and ACOMP/2010/286).

    Understanding Citizens' Vulnerabilities to Disinformation and Data-Driven Propaganda

    Disinformation strategies have evolved from "hack and dump" cyber-attacks and the random sharing of conspiracy or made-up stories into a more complex ecosystem, where narratives are used to feed people with emotionally charged true and false information, ready to be "weaponised" when necessary. Manipulated information, using a mix of emotionality and rationality, has recently become so pervasive and powerful as to rewrite reality, where the narration of facts (true, partial or false) counts more than the facts themselves. Every day, an enormous amount of information is produced on the web. Its diffusion is driven by algorithms originally conceived for the commercial market and then maliciously exploited for manipulative purposes and to build consensus. Citizens' vulnerability to disinformation operations is not only the result of the threats posed by hostile actors or psychometric profiling - which can be seen as both exploiters and facilitators - but essentially due to the effect of three different factors: information overload; distorted public perceptions produced by online platform algorithms built for viral advertising and user engagement; and the complex interaction of fast technological development, globalisation and post-colonialism, which have rapidly changed the rules-based international order. In such rapidly and dynamically evolving environments, increasing citizens' resilience against malicious attacks is ultimately of paramount importance to protect our open democratic societies, social values and individual rights and freedoms. JRC.E.7 - Knowledge for Security and Migration

    IBEREVAL OM: Opinion mining in the new textual genres

    The increasing amount of subjective data on the Web is creating the need to develop effective Question Answering systems able to discriminate such information from factual data and subsequently process it with specific methods. The participants in the IBEREVAL OM tasks will be given a set of opinion questions (in Spanish and English). Optionally, they will also be able to receive the same set of opinion questions in which the source, target and expected polarity, as well as the time span the question refers to, are given. They will also be provided with a collection of blog posts, extracted using the Technorati blog search engine (in Spanish and English), in which the answers to the opinion questions should be found. The gold standard for this blog post collection will previously be annotated with the EmotiBlog scheme by three annotators. The EmotiBlog corpus and the set of questions presented in (Balahur et al., 2009), in their present state, will be provided for system training. The participants will be able to take part in two subtasks: 1) in the first, they will be asked to provide the list of answers to each of the questions (in the same language as the questions, or in the other language); 2) in the second, they will be asked to provide a summary of the question answers - the top x% of the most important answers, in a non-redundant manner. The gold standard for the summaries will be automatically extracted from the manual annotations, taking into account the "intensity" parameter of the opinions expressed. This evaluation task proposal has been partially supported by Ministerio de Ciencia e Innovación - Spanish Government (grant no. TIN2009-13391-C04-01) and Conselleria d'Educació - Generalitat Valenciana (grants no. PROMETEO/2009/119 and ACOMP/2010/288).

    Mapping Nanomedicine Terminology in the Regulatory Landscape

    A common terminology is essential in any field of science and technology for a mutual understanding among different communities of experts and regulators, the harmonisation of policy actions, the standardisation of quality procedures and experimental testing, and communication to the general public. It also allows effective revision of information for policy making and optimises research fund allocation. In particular, in emerging scientific fields with a high innovation potential, new terms, descriptions and definitions are quickly generated and are then ambiguously used by stakeholders having diverse interests, coming from different scientific disciplines and/or from various regions. The application of nanotechnology in health - often called nanomedicine - is considered such an emerging and multidisciplinary field, attracting growing interest from various communities. In order to support a better understanding of terms used in the regulatory domain, the Nanomedicines Working Group of the International Pharmaceutical Regulators Forum (IPRF) has prioritised the need to map, compile and discuss the terminology currently used by regulatory scientists from different geographic areas. The JRC has taken the lead in identifying and compiling frequently used terms in the field by using web crawling and text mining tools as well as the manual extraction of terms. Websites of 13 regulatory authorities and clinical trial registries globally involved in regulating nanomedicines have been crawled. The compilation and analysis of extracted terms demonstrated sectorial and geographical differences in the frequency and type of nanomedicine-related terms used in a regulatory context. Finally, 31 relevant and most frequently used terms deriving from various agencies have been compiled, discussed and analysed for their similarities and differences. These descriptions will support the development of harmonised terminology use in the future. The report provides the necessary background information to advance the discussion among stakeholders. It will strengthen activities aiming to develop harmonised standards in the field of nanomedicine, which is an essential factor in stimulating innovation and industrial competitiveness. JRC.F.2 - Consumer Products Safety
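
    As an illustration of the automated part of the extraction, the sketch below counts candidate terms by frequency over a few crawled pages. The URL list, term patterns and cut-off are hypothetical; this is not the JRC pipeline.

```python
# Minimal sketch (illustrative assumptions, not the JRC tooling): crawl a
# few pages and count candidate nanomedicine-related terms by frequency.
import re
from collections import Counter
from urllib.request import urlopen

SEED_PATTERNS = [r"nano[a-z]+", r"liposom\w+", r"nanoparticle\w*"]  # hypothetical

def extract_terms(url):
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", html).lower()   # crude tag stripping
    terms = []
    for pattern in SEED_PATTERNS:
        terms.extend(re.findall(pattern, text))
    return terms

def rank_terms(urls, top_n=31):
    counts = Counter()
    for url in urls:
        counts.update(extract_terms(url))
    return counts.most_common(top_n)
```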

    Resource Creation and Evaluation for Multilingual Sentiment Analysis in Social Media Texts

    Sentiment analysis (SA) regards the classification of texts according to the polarity of the opinions they express. SA systems are highly relevant to many real-world applications (e.g. marketing, eGovernance, business intelligence, behavioural sciences) and also to many tasks in Natural Language Processing (NLP) - information extraction, question answering and textual entailment, to name just a few. The importance of this field has been proven by the high number of approaches proposed in research, as well as by the interest it has raised in other disciplines and the applications that have been created using its technology. In our case, the primary focus is to use sentiment analysis in the context of media monitoring, to enable the tracking of global reactions to events. The main challenge we face is that tweets are written in different languages, and an unbiased system should be able to deal with all of them in order to process all (possibly) available data. Unfortunately, although many linguistic resources exist for processing texts written in English, for many other languages data and tools are scarce. Following our initial efforts described in (Balahur and Turchi, 2013), in this article we extend our study of the possibility of implementing a multilingual system that is able to a) classify the sentiment expressed in tweets in various languages using training data obtained through machine translation; b) verify the extent to which the quality of the translations influences sentiment classification performance, in this case on highly informal texts; and c) improve multilingual sentiment classification using small amounts of data annotated in the target language. To this aim, varying sizes of target language data are tested. The languages we explore are Arabic, Turkish, Russian, Italian, Spanish, German and French. JRC.G.2 - Global security and crisis management
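
    A minimal sketch of points a) and c), assuming scikit-learn and placeholder data (this is not the authors' system): train a classifier for a target language on machine-translated tweets, optionally mixing in a small amount of data annotated in the target language.

```python
# Minimal sketch (assumes scikit-learn; variable names and data are placeholders):
# a target-language sentiment classifier trained on machine-translated tweets,
# optionally augmented with a small in-language annotated set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_target_language_classifier(mt_tweets, mt_labels,
                                     target_tweets=(), target_labels=()):
    # machine-translated training data, plus optional in-language examples
    texts = list(mt_tweets) + list(target_tweets)
    labels = list(mt_labels) + list(target_labels)
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
        LinearSVC(),
    )
    model.fit(texts, labels)
    return model

# usage (placeholder data):
# clf = train_target_language_classifier(mt_es_tweets, mt_labels, small_es, small_es_labels)
# clf.predict(["me encanta esta película"])
```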

    Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)

    This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory 2015), held in conjunction with the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015) at the China National Convention Center in Beijing, on July 31st 2015. Narratives are at the heart of information sharing. Ever since people began to share their experiences, they have connected them to form narratives. The study of storytelling and the field of literary theory called narratology have developed complex frameworks and models related to various aspects of narrative, such as plot structures, narrative embeddings, characters' perspectives, reader response, point of view, narrative voice, narrative goals, and many others. These notions from narratology have been applied mainly in Artificial Intelligence and in formal semantic approaches to modelling narratives (e.g. the Plot Units developed by Lehnert (1981)). In recent years, computational narratology has established itself as an autonomous field of study and research. Narrative has been the focus of a number of workshops and conferences (AAAI Symposia, the Interactive Storytelling Conference (ICIDS), Computational Models of Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML by Mani (2013)). The workshop aimed at bringing together researchers from different communities working on representing and extracting narrative structures in news, a text genre which is widely used in NLP but which has received little attention with respect to narrative structure, representation and analysis. Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic extraction of events from single documents and to work towards extracting story structures from multiple documents published over time as news streams. Policy makers, NGOs, information specialists (such as journalists and librarians) and others are increasingly in need of tools that support them in finding salient stories in large amounts of information in order to implement policies more effectively, monitor the actions of "big players" in society and check facts. Their tasks often revolve around reconstructing cases either with respect to specific entities (e.g. persons or organisations) or events (e.g. hurricane Katrina). Storylines represent explanatory schemas that enable us to make better selections of relevant information, but also projections into the future. They hold valuable potential for exploiting news data in an innovative way. JRC.G.2 - Global security and crisis management

    A method for classifying opinions in reviews extracted from the Web based on semantic relatedness

    Recent years have marked the beginning and rapid expansion of the social web, where people can freely express their opinions on different "objects", such as products, persons and topics, on blogs, forums or e-commerce sites. While the rapid growth of the volume of information on the web has allowed users to make better and more informed decisions, its expansion has led to the need to develop specialised NLP systems that automatically mine the web for opinions (retrieve, extract and classify the opinions on a query object). Opinion mining (sentiment analysis) has proven to be a difficult problem due to the large semantic variability of free text. In this article, we propose a method to extract, classify and summarise opinions on products from web reviews, based on the prior building of a taxonomy of product characteristics, on the semantic relatedness given by the Normalized Google Distance, and on SVM learning. We show that our approach outperforms the baselines and offers high precision and high confidence in the obtained classifications.
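
    The semantic relatedness component can be sketched as follows. The hit counts below are made up and the wrapper is not the authors' code; it only shows the Normalized Google Distance formula, whose values can then serve as features for the SVM classifier mapping opinion words onto the product-characteristic taxonomy.

```python
# Minimal sketch (an assumption about the feature computation, not the
# authors' exact code): the Normalized Google Distance between two terms,
# computed from search hit counts f(x), f(y), f(x,y) and an assumed index
# size N. Values close to 0 indicate closely related terms.
import math

def normalized_google_distance(fx, fy, fxy, n_pages):
    """NGD(x, y) = (max(log fx, log fy) - log fxy) / (log N - min(log fx, log fy))"""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n_pages) - min(log_fx, log_fy))

# example with made-up hit counts: relatedness of "battery" and "phone"
print(normalized_google_distance(fx=5e8, fy=2e9, fxy=3e8, n_pages=5e10))
```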